OpenAI and Anthropic Conduct First Cross-Company AI Safety Review
OpenAI and Anthropic have completed a landmark joint safety review of each other's AI systems, marking the first time two leading AI labs have opened their models to safety scrutiny by a direct competitor. The findings, published on August 27, 2025, covered four critical areas: adherence to the instruction hierarchy, resistance to jailbreak attempts, hallucination frequency, and signs of scheming or hidden intent.
Anthropic's Claude models performed strongly at following complex instructions and resisting system-prompt leaks, refusing roughly 70% of questions when their answers risked being inaccurate. OpenAI's models refused far less often but produced more hallucinated outputs, though they held up better in certain stress tests. The collaboration comes amid internal reports highlighting unexplained hallucination behaviors and shutdown-bypass risks in AI systems.